{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding Deep Learning Decisions in Statistical Downscaling Models\n",
    "### 10th International Conference on Climate Informatics 2020\n",
    "### Jorge Baño-Medina\n",
    "\n",
    "\n",
    "GitHub repository at https://github.com/SantanderMetGroup/DeepDownscaling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook reproduces the results presented in the paper *Understanding deep learning decisions in statistical downscaling models* by *Jorge Baño-Medina*, in the *10th International Conference on Climate Informatics 2020* (https://doi.org/10.5194/gmd-2019-278). The code developed herein address the computation of saliency maps relying on the prediction difference analysis technique to gain understanding about deep learning models regarding the application of statistical downscaling. In particular we provide saliency maps providing information regarding the downscaling of both precipitation and surface air temperature. The technical specifications of the machine can be found at the end of the notebook. **The notebook takes around 3-4 weeks to fully reproduce the results**.\n",
    "\n",
    "**Note:** This notebook is written in the free programming language `R`(version 3.6.1) and builds on the `R` framework [`climate4R`](https://github.com/SantanderMetGroup/climate4R) (C4R hereafter, conda and docker installations available), a suite of `R` packages developed by the [Santander Met Group](http://meteo.unican.es) for transparent climate data access, post processing (including bias correction and downscaling) and visualization. The interested reader is referred to [Iturbide et al. 2019](https://www.sciencedirect.com/science/article/pii/S1364815218303049?via%3Dihub)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Loading libraries\n",
    "\n",
    "All the working steps rely on the [climate4R](https://github.com/SantanderMetGroup/climate4R) (conda and docker installations available), which is a set of libraries especifically developed to handle climate data (`loadeR`,`loadeR.2nc`,`transformeR`,`downscaleR`,`visualizeR` and `climate4R.value`). In this study, [C4R](https://github.com/SantanderMetGroup/climate4R) is used for loading and post-processing, downscaling, validation and visualization. Different sectorial [notebooks](https://github.com/SantanderMetGroup/notebooks) are available illustrating the use of C4R functions. \n",
    "\n",
    "Deep learning models are included as an extension of the downscaleR package: [`downscaleR.keras`](https://github.com/SantanderMetGroup/downscaleR.keras) which integrates *keras* in the C4R framework. There is also a specific function devoted to the computation of saliency maps to provide interpretability of deep learning models.\n",
    "\n",
    "To install the associated C4R libraries you can proceed with the devtools package. Instructions can be found in the [climate4R](https://github.com/SantanderMetGroup/climate4R) github repository."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(magrittr)\n",
    "library(loadeR)  # version 1.6.1 \n",
    "library(transformeR) # version 1.7.4\n",
    "library(downscaleR.keras)# version 1.0.0 (relies on keras version 2.2.2 and tensorflow version 2.0.0)\n",
    "library(visualizeR) # version 1.5.1\n",
    "library(RColorBrewer)\n",
    "library(sp)\n",
    "library(gridExtra)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Loading data\n",
    "\n",
    "We rely on ERA-Interim and E-OBS as the predictor and predictand datasets in perfect conditions. In particular we consider a set of 20 large-scale predictors (see the `variables` R object in the following code chunk). All these data can be loaded remotely from the [Santander Climate Data Service](http://meteo.unican.es/cds) (register [here](http://meteo.unican.es/udg-tap/signup) freely to get a user), which provides access to various kinds of climate datasets (global and regional climate models, reanalysis, observations...). We will use the C4R packages [`loadeR`](https://github.com/SantanderMetGroup/loadeR) and [`transformeR`](https://github.com/SantanderMetGroup/transformeR) to load and postprocess the required information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "loginUDG(username = \"\", password = \"\") # login into the Santander CDS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We find the label associated to ERA-Interim via the `UDG.datasets()` function of loadeR: “ECMWF_ERA-Interim-ESD”. Then we load the predictors by calling `loadGridData` from loadeR. We use the period indicated in VALUE: 1979-2008."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "variables <- c(\"z@500\",\"z@700\",\"z@850\",\"z@1000\",\n",
    "               \"hus@500\",\"hus@700\",\"hus@850\",\"hus@1000\",\n",
    "               \"ta@500\",\"ta@700\",\"ta@850\",\"ta@1000\",\n",
    "               \"ua@500\",\"ua@700\",\"ua@850\",\"ua@1000\",\n",
    "               \"va@500\",\"va@700\",\"va@850\",\"va@1000\")\n",
    "x <- lapply(variables, function(x) {\n",
    "  loadGridData(dataset = \"ECMWF_ERA-Interim-ESD\",\n",
    "               var = x,\n",
    "               lonLim = c(-10,32),\n",
    "               latLim = c(36,72), \n",
    "               years = 1979:2008)\n",
    "}) %>% makeMultiGrid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Saliency Maps\n",
    "\n",
    "In this section we compute the saliency maps based on the prediction difference analysis technique. These can be directly computed with the `relevanceMaps` function of the `downscaleR.keras` library.\n",
    "\n",
    "In this study we evaluate interpretable maps for the following predictand gridpoints:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Paris <- c(2.25,48.75)\n",
    "Rome <- c(12.25,42.25)\n",
    "Copenhagen <- c(12.25,55.75)\n",
    "Alps <- c(6.75,46.25)\n",
    "stations <- rbind(Paris,Rome,Copenhagen,Alps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1. Precipitation\n",
    "To infer the saliency maps of the downscaling model devoted to precipitation we first download the precipitation variable of the E-OBS dataset with the `loadGridData` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pr <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n",
    "                  var = \"pr\",\n",
    "                  lonLim = c(-10,32),\n",
    "                  latLim = c(36,72), \n",
    "                  years = 1979:2008)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define the train period and then train the model with `downscaleTrain.keras`. We save the model in an `.h5` file that will be loaded to compute the saliency maps. This model is a replicate of the `CNN1` model intercompared in the [*Configuration and intercomparison of deep learning neural models for statistical downscaling*](https://www.geosci-model-dev.net/13/2109/2020/gmd-13-2109-2020-discussion.html) paper that provides a [jupyter notebook](https://github.com/SantanderMetGroup/DeepDownscaling/blob/master/2020_Bano_GMD_FULL.ipynb) for reproducibility, and therefore we refer the reader for concrete details regarding this model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xT <- subsetGrid(x,years = 1979:2002)\n",
    "xT <- xT %>% scaleGrid(type = \"standardize\")\n",
    "yT <- subsetGrid(y_pr,years = 1979:2002)\n",
    "xyT <- prepareData.keras(xT,\n",
    "                         binaryGrid(gridArithmetics(yT,0.99,operator = \"-\"),\n",
    "                                    condition = \"GE\",\n",
    "                                    threshold = 0,\n",
    "                                    partial = TRUE),\n",
    "                         first.connection = \"conv\",\n",
    "                         last.connection = \"dense\",\n",
    "                         channels = \"last\")\n",
    "inputs <- layer_input(shape = c(getShape(xT,\"lat\"),getShape(xT,\"lon\"),getShape(xT,\"var\")))\n",
    "l0 = inputs\n",
    "l1 = layer_conv_2d(inputs,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n",
    "l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n",
    "l3 = layer_conv_2d(l2,filters = 1, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n",
    "l4 = layer_flatten(l3)\n",
    "l51 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'sigmoid') \n",
    "l52 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n",
    "l53 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n",
    "outputs <- layer_concatenate(list(l51,l52,l53))      \n",
    "model <- keras_model(inputs = inputs, outputs = outputs) \n",
    "downscaleTrain.keras(obj = xyT,\n",
    "                     model = model,\n",
    "                     clear.session = TRUE,\n",
    "                     compile.args = list(\"loss\" = bernouilliGammaLoss(last.connection = \"dense\"),\n",
    "                                         \"optimizer\" = optimizer_adam(lr = 0.0001)),\n",
    "                     fit.args = list(\"batch_size\" = 100,\n",
    "                                     \"epochs\" = 1000,\n",
    "                                     \"validation_split\" = 0.1,\n",
    "                                     \"verbose\" = 1,\n",
    "                                     \"callbacks\" = list(callback_early_stopping(patience = 30),\n",
    "                                                        callback_model_checkpoint(filepath=paste0('CNN_pr.h5'),\n",
    "                                                                                  monitor='val_loss', save_best_only=TRUE))))\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we have trained the model, we subset the test period (i.e., 2008) and compute the rest of the parameters required as input by the `relevanceMaps` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xt <- scaleGrid(subsetGrid(x,years = 2008),\n",
    "                subsetGrid(x,years = 1979:2002),\n",
    "                type = \"standardize\") %>% redim(drop = TRUE)\n",
    "C4R.template <- subsetGrid(y_pr,years = 1979:2002)\n",
    "xy <- prepareData.keras(subsetGrid(x,years = 2008),\n",
    "                        subsetGrid(y_pr,years = 1979:2002),first.connection = \"conv\",\n",
    "                        last.connection = \"dense\",\n",
    "                        channels = \"last\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We call the `relevanceMaps` function that will provide a saliency map for every day of the test period. The parameter `parch` indicates that the input features will be marginalized by channel (i.e., providing the influence of every input variable independently). Their influence will be measured over the stations chosen, input in the function via the `outputCoords` parameter, by computing the difference in the expectance of a Bernouilli-Gamma distribution (see the `loss` parameter)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rm(x,y_pr,xT,yT,xyT) # to free memory\n",
    "gc() # to free memory\n",
    "objMaps <- relevanceMaps(xt, \n",
    "                         obj = xy,\n",
    "                         C4R.template = C4R.template,\n",
    "                         outputCoords = stations,\n",
    "                         model = list(\"filepath\" = paste0(\"CNN_pr.h5\"), \n",
    "                                      \"custom_objects\" = c(\"custom_loss\" = bernouilliGammaLoss(last.connection = \"dense\"))),\n",
    "                         loss = \"bernouilliGammaLoss\",\n",
    "                         parch = \"channel\",\n",
    "                         k=1,l=5,num_samples = 10)\n",
    "attr(objMaps$Variable,\"longname\") <- objMaps$Variable$varName\n",
    "objMaps$Variable$level <- objMaps$Variable$level*NA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define a new function `fig` that will visualize the saliency maps by relying on the plotting functions of library `visualizeR`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig <- function(objMaps) {\n",
    "  cb <- brewer.pal(n = 9, \"PuBu\") \n",
    "  cb[1:2] <- \"#FFFFFF\" ; cb <- cb %>% colorRampPalette()\n",
    "  lapply(1:getShape(objMaps,dimension = \"member\"), FUN = function(z){\n",
    "    grid <- subsetGrid(objMaps,members = z) \n",
    "    grid$Data <- abs(grid$Data) \n",
    "    grid <- grid %>% climatology()\n",
    "    seqPlot <- seq(0,abs(max(grid$Data)+0.01*max(grid$Data)), length.out = 25)\n",
    "    attr(grid,\"memberCoords\")$x <- attr(objMaps,\"memberCoords\")$x[z]\n",
    "    attr(grid,\"memberCoords\")$y <- attr(objMaps,\"memberCoords\")$y[z]\n",
    "    spatialPlot(grid,backdrop.theme = \"coastline\",\n",
    "                col.regions = cb,\n",
    "                at = seqPlot,\n",
    "                set.min = seqPlot[1], set.max = seqPlot[length(seqPlot)],\n",
    "                colorkey = TRUE,\n",
    "                sp.layout = list(list(SpatialPoints(attr(grid,\"memberCoords\")), first = FALSE,\n",
    "                                      col = \"red\", pch = 15, cex = 0.75))\n",
    "    )\n",
    "  })\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we call the above defined function and save the saliency maps in a `.pdf` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f <- fig(objMaps)\n",
    "pdf(file = paste0(\"fig_pr.pdf\"),width = 20,height = 20)\n",
    "grid.arrange(grobs = f, ncol = 2)\n",
    "dev.off() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Temperature\n",
    "In this section we infer the saliency maps of the downscaling model of temperature. As done in the last section with precipitation, we load the surface air temperature of the E-OBS dataset with `loadGridData`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_tas <- loadGridData(dataset = \"E-OBS_v14_0.50regular\",\n",
    "                     var = \"tas\",\n",
    "                     lonLim = c(-10,32),\n",
    "                     latLim = c(36,72), \n",
    "                     years = 1979:2008)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we repeat the steps to compute the saliency maps done with the downscaling model of precipitation: train the model, compute the maps and save the plots in a `.pdf` file. Note that the only exception, is that in this case the last hidden layer consists on 10 feature maps and that the loss function is the negative log-likelihood of a gaussian distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xT <- subsetGrid(x,years = 1979:2002)\n",
    "xT <- xT %>% scaleGrid(type = \"standardize\")\n",
    "yT <- subsetGrid(y_tas,years = 1979:2002)\n",
    "xyT <- prepareData.keras(xT,yT,\n",
    "                         first.connection = \"conv\",\n",
    "                         last.connection = \"dense\",\n",
    "                         channels = \"last\")\n",
    "inputs <- layer_input(shape = c(getShape(xT,\"lat\"),getShape(xT,\"lon\"),getShape(xT,\"var\")))\n",
    "l0 = inputs\n",
    "l1 = layer_conv_2d(inputs,filters = 50, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n",
    "l2 = layer_conv_2d(l1,filters = 25, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n",
    "l3 = layer_conv_2d(l2,filters = 10, kernel_size = c(3,3), activation = 'relu', padding = \"valid\")\n",
    "l4 = layer_flatten(l3)\n",
    "l51 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n",
    "l52 = layer_dense(l4,units = ncol(xyT$y$Data), activation = 'linear') \n",
    "outputs <- layer_concatenate(list(l51,l52))      \n",
    "model <- keras_model(inputs = inputs, outputs = outputs) \n",
    "downscaleTrain.keras(obj = xyT,\n",
    "                     model = model,\n",
    "                     clear.session = TRUE,\n",
    "                     compile.args = list(\"loss\" = gaussianLoss(last.connection = \"dense\"),\n",
    "                                         \"optimizer\" = optimizer_adam(lr = 0.0001)),\n",
    "                     fit.args = list(\"batch_size\" = 100,\n",
    "                                     \"epochs\" = 1000,\n",
    "                                     \"validation_split\" = 0.1,\n",
    "                                     \"verbose\" = 1,\n",
    "                                     \"callbacks\" = list(callback_early_stopping(patience = 30),\n",
    "                                                        callback_model_checkpoint(filepath=paste0('CNN_tas.h5'),\n",
    "                                                                                  monitor='val_loss', save_best_only=TRUE))))\n",
    "\n",
    "\n",
    "xt <- scaleGrid(subsetGrid(x,years = 2008),\n",
    "                subsetGrid(x,years = 1979:2002),\n",
    "                type = \"standardize\") %>% redim(drop = TRUE)\n",
    "C4R.template <- subsetGrid(y_tas,years = 1979:2002)\n",
    "xy <- prepareData.keras(subsetGrid(x,years = 2008),\n",
    "                        subsetGrid(y_tas,years = 1979:2002),first.connection = \"conv\",\n",
    "                        last.connection = \"dense\",\n",
    "                        channels = \"last\")\n",
    "rm(x,y_tas,xT,yT,xyT) # to free memory\n",
    "gc() # to free memory\n",
    "objMaps <- relevanceMaps(xt, \n",
    "                         obj = xy,\n",
    "                         C4R.template = C4R.template,\n",
    "                         outputCoords = stations,\n",
    "                         loss = \"gaussianLoss\",\n",
    "                         model = list(\"filepath\" = paste0(\"CNN_tas.h5\"), \n",
    "                                      \"custom_objects\" = c(\"custom_loss\" = gaussianLoss(last.connection = \"dense\"))),\n",
    "                         parch = \"channel\",\n",
    "                         k=1,l=5,num_samples = 30)\n",
    "attr(objMaps$Variable,\"longname\") <- objMaps$Variable$varName\n",
    "objMaps$Variable$level <- objMaps$Variable$level*NA\n",
    "f <- fig(objMaps)\n",
    "pdf(file = paste0(\"fig_tas.pdf\"),width = 20,height = 20)\n",
    "grid.arrange(grobs = f, ncol = 2)\n",
    "dev.off() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Technical especifications\n",
    "To perform all the stages involved in this study we relied on the machine described below.\n",
    "- Machine (HP-ProDesk-600-G2-MT)\n",
    "  - Operating system: ubuntu 4.15.0-72-generic\n",
    "  - Memory: 15.6 GiB\n",
    "  - Processor: Intel® Core™ i7-6700 CPU @ 3.40GHz × 8\n",
    "  - SO: 64 bits\n",
    "  - Disc: 235.1 GiB"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
